Experiments in confidence scoring using Spanish CallHome data
Abstract
We present results on several tasks involved in confidence scoring of the output of a continuous speech recognition system, including the search for predictor variables and model selection. We introduce the DET (detection error tradeoff) curve characteristic (DCC) score, which we use together with the normalized cross entropy (NCE) score to evaluate models and predictor variables. We also report experimental results suggesting how the NCE and DCC scores vary with recognizer performance.
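The abstract does not restate how NCE is computed. The sketch below shows one way to do it, assuming the commonly used NIST definition. The baseline assigns every hypothesized word the overall fraction of correct words, p_c. NCE is the relative reduction in entropy achieved by the scored confidences compared with that baseline. The function name `normalized_cross_entropy` and the clipping constant are hypothetical choices made for illustration. The DCC score is not reproduced here because its definition is not given in this abstract.

```python
import math
from typing import Sequence


def normalized_cross_entropy(confidences: Sequence[float], correct: Sequence[bool]) -> float:
    """Hypothetical helper: NCE of per-word confidences (assumes the NIST-style definition)."""
    if not confidences or len(confidences) != len(correct):
        raise ValueError("need equal-length, non-empty inputs")
    n_total = len(correct)
    n_correct = sum(1 for ok in correct if ok)
    if n_correct in (0, n_total):
        raise ValueError("NCE is undefined when all words are correct or all are incorrect")

    # Baseline: every word gets the overall fraction of correctly recognized words.
    p_c = n_correct / n_total
    h_max = -n_correct * math.log2(p_c) - (n_total - n_correct) * math.log2(1 - p_c)

    # Log-likelihood (bits) of the correctness labels under the scored confidences.
    log_likelihood = 0.0
    for p, ok in zip(confidences, correct):
        p = min(max(p, 1e-12), 1 - 1e-12)  # clip so log2 never sees 0 (assumed epsilon)
        log_likelihood += math.log2(p) if ok else math.log2(1 - p)

    # 0 = no better than the baseline, 1 = perfect, negative = worse than the baseline.
    return (h_max + log_likelihood) / h_max


if __name__ == "__main__":
    labels = [True, True, True, False]
    print(normalized_cross_entropy([0.9, 0.8, 0.7, 0.2], labels))
```

Under this definition, assigning every word the overall accuracy gives an NCE of about 0. Confidences that separate correct words from errors push the score toward 1. Confidences that point the wrong way produce a negative score.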
Similar sources
Syllable-final /s/ lenition in the LDC's CallHome Spanish corpus
This paper describes a data corpus which is being made available through the Linguistic Data Consortium (LDC) that codes lenition of syllable-final /s/ in Latin American Spanish in the LDC’s CallHome Spanish corpus. This lenition is a process whereby the /s/ may be aspirated (pronounced [h]) or deleted altogether. Since syllable-final /s/ is frequent in Spanish, lenition has a great effect on o...
A probabilistic approach to confidence estimation and evaluation
In this paper we propose a novel way of estimating confidences for words that are recognized by a speech recognition system, together with a natural methodology for evaluating the overall quality of those confidence estimates. Our approach is based on an interpretation of a confidence as the probability that the corresponding recognized word is correct, and makes use of generalized linear model...
Improved Speech-to-Text Translation with the Fisher and Callhome Spanish–English Speech Translation Corpus
Research into the translation of the output of automatic speech recognition (ASR) systems is hindered by the dearth of datasets developed for that explicit purpose. For Spanish–English translation, in particular, most parallel data available exists only in vastly different domains and registers. In order to support research on cross-lingual speech applications, we introduce the Fisher and Callho...
Multilingual speech recognition: the 1996 Byblos Callhome system
This paper describes the 1996 Byblos Callhome speech recognition system for Spanish and Egyptian Colloquial Arabic. The system uses a combination of Phonetically Tied-Mixture Gaussian HMMs and State-Clustered Tied-Mixture Gaussian HMMs in a multiple pass decoder. We focus here on the aspects of the system which are language specific and demonstrate the adaptability of the Byblos English system to...
Latent Semantic Analysis for Dialogue Act Classification
This paper presents our experiments in applying Latent Semantic Analysis (LSA) to dialogue act classification. We employ both LSA proper and LSA augmented in two ways. We report results on DIAG, our own corpus of tutoring dialogues, and on the CallHome Spanish corpus. Our work has the theoretical goal of assessing whether LSA, an approach based only on raw text, can be improved by using additio...
Publication date: 1998